borderline sample
OASIS: Open-world Adaptive Self-supervised and Imbalanced-aware System
Kim, Miru, Joe, Mugon, Kwon, Minhae
The expansion of machine learning into dynamic environments presents challenges in handling open-world problems where label shift, covariate shift, and unknown classes emerge. Post-training methods have been explored to address these challenges, adapting models to newly emerging data. However, these methods struggle when the initial pre-training is performed on class-imbalanced datasets, limiting generalization to minority classes. To address this, we propose a method that effectively handles open-world problems even when pre-training is conducted on imbalanced data. Our contrastive-based pre-training approach enhances classification performance, particularly for underrepresented classes. Our post-training mechanism generates reliable pseudo-labels, improving model robustness against open-world problems. We also introduce selective activation criteria to optimize the post-training process, reducing unnecessary computation. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art adaptation techniques in both accuracy and efficiency across diverse open-world scenarios.
Approximate Borderline Sampling using Granular-Ball for Classification Tasks
Xie, Qin, Zhang, Qinghua, Xia, Shuyin
Chongqing Key Laboratory of Computational Intelligence Chongqing University of Posts and T elecommunications Chongqing, China d210201029@stu.cqupt.edu.cn Chongqing Key Laboratory of Computational Intelligence Chongqing University of Posts and T elecommunications Chongqing, China zhangqh@cqupt.edu.cn Chongqing Key Laboratory of Computational Intelligence Chongqing University of Posts and T elecommunications Chongqing, China xiasy@cqupt.edu.cn Abstract --Data sampling enhances classifier efficiency and robustness through data compression and quality improvement. Recently, the sampling method based on granular-ball (GB) has shown promising performance in generality and noisy classification tasks. However, some limitations remain, including the absence of borderline sampling strategies and issues with class boundary blurring or shrinking due to overlap between GBs. In this paper, an approximate borderline sampling method using GBs is proposed for classification tasks. First, a restricted diffusion-based GB generation (RD-GBG) method is proposed, which prevents GB overlaps by constrained expansion, preserving precise geometric representation of GBs via redefined ones. Second, based on the concept of heterogeneous nearest neighbor, a GB-based approximate borderline sampling (GBABS) method is proposed, which is the first general sampling method capable of both borderline sampling and improving the quality of class noise datasets. Additionally, since RD-GBG incorporates noise detection and GBABS focuses on borderline samples, GBABS performs outstandingly on class noise datasets without the need for an optimal purity threshold. Experimental results demonstrate that the proposed methods outperform the GB-based sampling method and several representative sampling methods. Data sampling plays a pivotal role in supervised machine learning, particularly for classification tasks. It offers a multitude of benefits, including reduced computational complexity, balanced class distributions, diminished effects of noise and outliers, alleviation of overfitting, and enhanced model inter-pretability.
Distribution-Informed Adaptation for kNN Graph Construction
Graph-based kNN algorithms have garnered widespread popularity for machine learning tasks due to their simplicity and effectiveness. However, as factual data often inherit complex distributions, the conventional kNN graph's reliance on a unified k-value can hinder its performance. A crucial factor behind this challenge is the presence of ambiguous samples along decision boundaries that are inevitably more prone to incorrect classifications. To address the situation, we propose the Distribution-Informed adaptive kNN Graph (DaNNG), which combines adaptive kNN with distribution-aware graph construction. By incorporating an approximation of the distribution with customized k-adaption criteria, DaNNG can significantly improve performance on ambiguous samples, and hence enhance overall accuracy and generalization capability. Through rigorous evaluations on diverse benchmark datasets, DaNNG outperforms state-of-the-art algorithms, showcasing its adaptability and efficacy across various real-world scenarios.
Characterizing the Decision Boundary of Deep Neural Networks
Karimi, Hamid, Derr, Tyler, Tang, Jiliang
Deep neural networks and in particular, deep neural classifiers have become an integral part of many modern applications. Despite their practical success, we still have limited knowledge of how they work and the demand for such an understanding is evergrowing. In this regard, one crucial aspect of deep neural network classifiers that can help us deepen our knowledge about their decision-making behavior is to investigate their decision boundaries. Nevertheless, this is contingent upon having access to samples populating the areas near the decision boundary. To achieve this, we propose a novel approach we call Deep Decision boundary Instance Generation (DeepDIG). DeepDIG utilizes a method based on adversarial example generation as an effective way of generating samples near the decision boundary of any deep neural network model. Then, we introduce a set of important principled characteristics that take advantage of the generated instances near the decision boundary to provide multifaceted understandings of deep neural networks. We have performed extensive experiments on multiple representative datasets across various deep neural network models and characterized their decision boundaries.